Improving polygenic risk prediction from summary statistics by an empirical Bayes approach
Abstract
Polygenic risk scores (PRS) from genome-wide association studies (GWAS) are increasingly used to predict disease risks. However, some of the included variants may be false positives, and the raw effect size estimates for the selected variants may be subject to selection bias. In addition, the standard PRS approach requires testing over a range of p-value thresholds, which are often chosen arbitrarily. The prediction error estimated at the optimized threshold may also be optimistically biased. To improve genomic risk prediction, we proposed new empirical Bayes approaches to recover the underlying effect sizes and used them as weights to construct PRS. We applied the new PRS to twelve cardio-metabolic traits in the Northern Finland Birth Cohort and demonstrated improvements in predictive power (in R²) compared to standard PRS at the best p-value threshold. Importantly, for eleven of the twelve traits studied, the predictive performance from the entire set of genome-wide markers outperformed the best R² from standard PRS at optimal p-value thresholds. Our proposed methodology essentially enables an automatic PRS weighting scheme without the need to choose tuning parameters. The new method also performed satisfactorily in simulations. It is computationally simple and does not require assumptions on the effect size distributions.
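The abstract does not say which empirical Bayes estimator is used, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes Tweedie's formula, a well-known empirical Bayes correction that needs no prior on the effect sizes: for a z-statistic with unit variance, the corrected value is z plus the derivative of the log marginal density of all z-statistics. The function names `tweedie_shrink` and `polygenic_score`, the Gaussian kernel density estimate, and the rule-of-thumb bandwidth are all assumptions made for this example.

```python
import numpy as np

def tweedie_shrink(z, bandwidth=None):
    """Empirical Bayes correction of z-statistics via Tweedie's formula.

    E[mu | z] = z + d/dz log f(z), where f is the marginal density of z.
    Here f is estimated with a Gaussian kernel density estimate (an
    assumption of this sketch). Memory use is O(n^2), so this is only
    suitable for a modest number of variants.
    """
    z = np.asarray(z, dtype=float)
    n = z.size
    if bandwidth is None:
        # Rule-of-thumb bandwidth (assumed choice, not from the source).
        bandwidth = 1.06 * z.std() * n ** (-1 / 5)
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    diff = z[:, None] - z[None, :]                 # diff[i, j] = z_i - z_j
    kern = np.exp(-0.5 * (diff / bandwidth) ** 2)  # unnormalised Gaussian kernel
    # f'(z_i) / f(z_i): the kernel normalising constants cancel in the ratio.
    score = -(kern * diff).sum(axis=1) / (kern.sum(axis=1) * bandwidth ** 2)
    return z + score

def polygenic_score(genotypes, weights):
    """Weighted allele count per individual (rows: people, columns: variants)."""
    return np.asarray(genotypes, dtype=float) @ np.asarray(weights, dtype=float)
```

Under these assumptions, per-variant weights on the effect size scale could be obtained as `tweedie_shrink(beta / se) * se`, where `beta` and `se` are hypothetical arrays of GWAS effect estimates and their standard errors, and then passed to `polygenic_score` for all variants at once, with no p-value threshold to tune.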
Related resources
Integrative genetic risk prediction using non-parametric empirical Bayes classification.
Genetic risk prediction is an important component of individualized medicine, but prediction accuracies remain low for many complex diseases. A fundamental limitation is the sample sizes of the studies on which the prediction algorithms are trained. One way to increase the effective sample size is to integrate information from previously existing studies. However, it can be difficult to find ex...
Multiethnic polygenic risk scores improve risk prediction in diverse populations.
Methods for genetic risk prediction have been widely investigated in recent years. However, most available training data involves European samples, and it is currently unclear how to accurately predict disease risk in other populations. Previous studies have used either training data from European samples in large sample size or training data from the target population in small sample size, but...
Explicit Modeling of Ancestry Improves Polygenic Risk Scores and BLUP Prediction.
Polygenic prediction using genome-wide SNPs can provide high prediction accuracy for complex traits. Here, we investigate the question of how to account for genetic ancestry when conducting polygenic prediction. We show that the accuracy of polygenic prediction in structured populations may be partly due to genetic ancestry. However, we hypothesized that explicitly modeling ancestry could impro...
Dynamic Empirical Bayes Models and Their Applications to Longitudinal Data Analysis and Prediction
Empirical Bayes modeling has a long and celebrated history in statistical theory and applications. After a brief review of the literature, we propose a new dynamic empirical Bayes modeling approach which provides flexible and computationally efficient methods for the analysis and prediction of longitudinal data from many individuals. This dynamic empirical Bayes approach pools the cross-section...
Small Area Estimation of the Mean of Household's Income in Selected Provinces of Iran with Hierarchical Bayes Approach
Extended Abstract. Small area estimation has received a lot of attention in recent years due to the growing demand for reliable small area statistics. Direct estimators may not provide adequate precision, because sample sizes in small areas are seldom large enough. Hence, by employing models that can use auxiliary information and area effects in descriptions, one can increase the precision of direct...